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Abstract-  Supporting  large  volumes  of  multi-dimensional  data  is  an  inherent  characteristic  of 
modem  database  applications,  such  as  Geographical  Information  Systems  (GIS),  Computer  Aided 
design  (CAD),  and  Image  and  Multimedia  Databases.  Such  databases  need  underlying  systems 
with  extended  features  like  query  languages,  data  models,  and  indexing  methods,  as  comparedto 
traditional  databases,  mainly  because  of  the  complexity  of  representing  and  retneving  data.  The 
presented  work  deals  with  access  methods  for  databases  that  accurately  model  the  real  world 
More  precisely,  the  focus  is  on  index  structures  that  can  capture  the  time  varying  nature  ot 
moving  objects,  namely  spatio-temporal  structures.  A  new  taxonomy  to  classify  these  structures 
has  been  defined  according  to  dataset  characteristics  and  query  requirements.  Then,  a  new  spatio- 
temporal  access  method,  the  2-3TR-tree,  has  been  designed  to  process  specific  datasets  and  fulfill 
specific  query  requirements  that  no  other  existing  spatio-temporal  index  could  hand  e. 

Key  Words  -  Spatio-temporal,  indexing,  R-tree,  Taxonomy. 

1 .  Spatio-Temporal  R-tree-based  Structures 

The  primary  goal  of  a  spatio-temporal  databases  [LASA02]  is  the  accurate  modeling  of  the 
real  world;  that  is  a  dynamic  world,  which  involves  objects  whose  position,  shape  and  size 
change  over  time.  Real  life  examples  that  need  to  handle  spatio-temporal  data  include  storage  and 
manipulation  of  ship  and  plane  trajectories,  fire  or  hurricane  front  monitor  and  weather  forecast. 
Geographical  information  systems  are  also  a  source  for  spatio-temporal  data.  Thus,  spatio- 
temporal  databases  are  used  in  many  applications;  therefore,  in  this  section  we  are  mterested  m 
structures  that  can  index  efficiently  such  databases.  We  deal  with  data  points,  since  in  many 
applications  the  size  and  shape  of  an  object  do  not  matter,  only  its  position  does. 

To  construct  a  spatio-temporal  index  structure  the  approach  in  current  research  is  to 
extend  existing  spatial  access  methods  in  order  to  keep  their  spatial  properties.  There  are  two 
major  trends  considering  the  indexing  of  spatial  data.  The  first  trend  is  to  use  space-driven 
structures:  these  are  indices  based  on  the  partitioning  of  the  embedding  two-dimensional  space 
into  rectangular  cells,  independently  of  the  distribution  of  the  objects  in  the  2D  plane.  The  most 
popular  space-driven  structures  are  the  Quadtree  and  its  variants  [FIBE74]. 
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The  second  trend  is  to  use  data-driven  structures  to  index  spatial  data.  Data-driven 
structures  are  organized  by  partitioning  the  set  of  objects,  and  not  the  embedding  space.  The  most 
used  data  driven  structures  are  the  R-tree  [GUTT84]  and  its  variants  [SELL87,  BECK90,  and 
KAFA94].  R-trees  are  generally  more  efficient  for  non-uniform  data  than  Quadtrees.  Indeed,  a 
concentration  of  data  points  in  the  2D  space  results  in  a  very  long  branch  in  a  Quadtree,  which  in 
turn  increases  drastically  the  processing  time  for  insertion,  deletion  and  spatial  searching  in  this 
area.  This  does  not  happen  in  R-trees  since  they  are  height-balanced.  Therefore,  we  will  focus 
on  R-tree-based  structures. 

In  this  section,  we  first  define  a  taxonomy  of  spatio-temporal  access  methods  that  brings  out 
the  different  dimensions  to  be  considered  in  designing  a  spatio-temporal  index  structure.  Then, 
we  review  the  existing  R-tree-based  structures  that  handle  spatio-temporal  data  and  classify  them 
according  to  the  newly  defined  taxonomy. 

2.Taxonomy 

The  taxonomy  proposed  in  this  paper  is  made  up  of  the  different  dimensions  that  have  to  be 
considered  in  the  design  and  evaluation  of  a  spatio-temporal  access  method.  These  are  eight 
distinct  dimensions  that  are  related  to  the  dataset  characteristics,  the  query  requirements, 
discrimination,  insertion/split  strategy  and  version  duplication.  It  should  be  noted  that,  to  our 
knowledge,  there  exists  only  one  other  taxonomy  for  spatio-temporal  structures  that  has  been 
reported  in  [TS98].  The  classification  carried  out  in  [TS98]  does  not  include  as  many  spatio- 
temporal  structures  as  those  presented  in  this  paper,  and  does  not  cover  all  the  dimensions 
specified  by  our  new  taxonomy.  As  a  consequence,  some  existing  spatio-temporal  indices  cannot 

be  differentiated  using  the  taxonomy  in  [TS98]. 

The  eight  dimension  distinct  dimensions  of  our  taxonomy  are  presented  below: 

Dataset  Characteristics 

1.  Time  Dimension 

Different  temporal  aspects  can  be  considered  when  storing  temporal  data: 

Valid-time:  This  is  the  time  when  the  fact  is  true  in  the  modeled  reality.  The  valid-time 
of  a  fact  can  in  the  past,  or  in  the  future  and  can  be  changed  freely. 

Transaction-time:  This  is  the  time  when  the  fact  is  current  in  the  database  and  may  be 
retrieved.  The  transaction-time  of  a  fact  cannot  extend  beyond  the  current  time  nor  can  it 
be  changed.  Insertions  and  deletions  can  only  happen  at  the  current  time,  in  other  words 
there  is  no  update  in  the  past.  Moreover,  deletions  are  logical:  the  actual  records  are  not 
physically  removed  from  the  database. 

Bitemporal  data:  Data  with  both  valid  and  transaction-time. 
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2.  Time  Evolution 

Datasets  can  be  constituted  of  positions  or  parts  of  trajectories  (that  can  be  interpolated). 
Therefore,  two  aspects  of  time  evolution  can  be  considered: 

Discrete:  Only  positions  of  objects  at  different  timestamps  are  stored.  No  information 
can  be  retrieved  between  two  timestamps,  or  only  the  position  at  the  previous  timestamp. 
Pontinnous:  Continuous  in  this  context  does  not  mean  that  we  can  retrieve  the  exact 
position  of  an  object  at  any  timestamp.  Instead,  it  means  that  trajectories  or  trajectory 
segments  are  stored.  They  may  be  interpolated,  i.e.  the  positions  of  an  object  are 
recorded,  and  the  trajectory  between  them  is  interpolated  (in  most  cases  linearly). 
Therefore,  the  database  can  be  queried  for  any  value  of  time. 

3.  Data  Evolution 

This  dimension  deals  with  the  mobility  in  time  and  in  space  (location)  of  the  data  objects. 
The  dataset  can  be: 

Moving:  The  cardinality  of  the  database,  i.e.  the  number  of  object  instances,  does  not 
change  over  time,  but  the  database  objects  can  change  their  location. 

Growing:  The  cardinality  of  the  database  can  change,  but  the  objects  stay  at  the  same 

location. 

Fully  Evolving:  The  cardinality  of  the  database  can  change  and  the  objects  can  change 
location. 

4.  Data  Acquisition 

There  are  different  ways  for  the  object  instances  to  be  inserted  in  the  database  or  updated. 
The  acquisition  of  the  data  can  be: 

Static:  The  data  is  known  before  hand  and  no  insertions  or  updates  are  allowed. 
Chronological:  Only  current  instances  are  dynamically  inserted  or  updated.  In 
transaction-time  databases,  if  data  acquisition  is  not  static,  it  is  chronological  and  past 
cannot  be  changed. 

Dynamic:  Insertions  and  updates  to  any  object  referred  to  any  timestamp  are  allowed. 
Query  Requirements 

5.  Query  Types 

In  addition  to  classical  spatial  range  queries,  different  types  of  queries  may  be  required 
by  an  application.  Based  on  this,  an  application  can  be: 

Historical  information  oriented:  In  this  type  of  applications,  access  methods  aim  at  the 
retrieval  of  historical  spatio-temporal  information  about  objects  that  evolve.  This 
involves  queries  of  the  form  “Find  all  objects  that  have  lied  within  a  specific  area  (or  at  a 
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specific  point),  during  a  specific  time  interval  (or  at  a  specific  timestamp)”.  Time  slice  or 
timestamp  queries  retrieve  all  objects  that  intersect  a  window  at  a  specific  timestamp. 
Time  interval  queries  include  several  (usually  consecutive)  timestamps. 

Trajectory  oriented:  Certain  applications  need  to  store  motion  patterns  of  objects.  The 
goal  in  this  case  is  to  retrieve  the  line  segments  or  position  points  representing  the 
trajectories  of  certain  objects.  We  will  especially  focus  on  the  retrieval  of  trajectories 
carried  out  by  selecting  a  segment  of  the  trajectory  using  a  spatio-temporal  range,  and 
then  identifying  the  remaining  segments  or  points  that  are  part  of  the  trajectory. 

Combined:  This  type  of  applications  combines  both  historical  information  retrieval  and 

trajectory  retrieval. 

Spatial/Temnoral  Discrimination 
6. Discrimination 

The  discrimination  capability  of  an  access  method  deals  with  the  workspace 
dimensions/properties  used  to  index  and  retrieve  the  data  in  the  index  structure.  The  more 
discrimination  there  is,  the  better  the  range  query  performance  is.  In  the  context  of 
spatio-temporal  access  methods,  there  are  three  possible  levels  of  discrimination. 

Spatial:  Data  is  indexed  and  retrieved  according  to  spatial  dimensions. 

Temporal:  Data  is  indexed  and  retrieved  according  to  the  time  dimension. 

Spatio-temporal:  Data  is  indexed  and  retrieved  according  to  both  spatial  and  time 

dimensions. 

Insertion/Svlit 

7. Insertion/Split  Strategy 

This  dimension  deals  with  the  construction  of  the  index  structure.  It  represents  the 
strategy  that  is  applied  when  choosing  the  insertion  path  or  splitting  a  node  that 
overflows.  The  strategy  adopted  by  an  index  structure  plays  a  role  in  query  processing 
and  affects  the  performance  of  certain  query  types.  There  are  three  different  strategies 
that  can  be  emphasized: 

t  oaet  Tnlarpomont:  To  insert  a  new  entry  in  the  index,  the  path  is  chosen  with  respect  to 
the  least  enlargement  criterion.  In  other  words,  we  follow  the  nodes  whose  MBBs  need 
the  least  enlargement  to  insert  the  new  entry.  Similarly,  the  entries  of  an  overflowing 
node  are  split  into  two  nodes  according  to  the  least  enlargement  criterion.  This  is  the 

strategy  used  by  the  R-tree. 
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Minimum  Overlap:  The  insertion  and  split  algorithms  assign  an  entry  to  the  nodes  that 
result  in  a  minimum  overlap  between  the  MBBs  of  the  nodes  at  the  same  level.  Since 
there  is  less  overlap  than  with  the  least  enlargement  strategy,  fewer  paths  may  need  to  be 
searched  for  historical  information  retrievals,  and  thus,  search  performance  is  increased. 
This  is  the  strategy  used  by  R* -trees  [BECK90], 

T  .inear  Ordering:  The  insertion  and  split  algorithms  assign  an  entry  according  to  a  linear 
ordering  defined  by  a  space  filling  curve.  This  strategy  improves  node  utilization,  and 
therefore  historical  information  retrievals.  So  far,  the  Minimum  Overlap  and  the  Linear 
Ordering  are  the  most  efficient  insertion/split  strategies. 

Trajectory  Preservation:  The  insertion  and  split  algorithms  group  the  entries  according  to 
the  trajectories  they  belong  to.  This  strategy  improves  trajectory  retrievals  to  the 
detriment  of  historical  information  retrievals  since  it  usually  involves  more  overlap. 

Version  Duplication 

8 .  Version  Duplication 

This  dimension  indicates  whether  the  index  structure  contains  multiple  copies  of  the  same 
object  at  the  same  positions  but  at  different  timestamps.  This  phenomenon  usually  occurs 
with  overlapping  and  multi-version  techniques.  Version  duplication  is  to  be  considered  if 
storage  requirement  is  an  important  parameter  for  the  application  that  uses  the  access 
method.  Indeed,  data  duplication  will  increase  the  size  of  the  index  structure.  Obviously, 
the  two  possible  values  for  version  duplication  are: 

Dunlication:  Version  duplication  is  involved. 

No  Duplication:  No  version  duplication  is  involved 

We  now  classify  existing  spatio-temporal  R-tree  based  data  structures  according  to  the 
previously  defined  taxonomy.  As  stated  previously,  we  focus  on  R-tree-based  structures. 
However,  it  should  be  noted  that  currently  there  is  one  spatio-temporal  index  based  on  the 
Quadtree:  the  Overlapping  Linear  Quadtree  [TZ098].  Regarding  the  R-tree-based  structures, 
there  is  a  limited  number  of  proposals:  the  RT-tree  [XULU90],  the  3D  R-tree1  [VAZ96],  the  2+3 
R-tree  [NST99],  the  HR-tree  [NAS98],  the  MV3R-tree  [YUPAOO],  and  the  STR-tree  and  the  TB- 
tree  [PJTOO].  Table  1  classifies  these  existing  R-tree  structures  using  our  newly  defined  taxonomy. 

Table  2  shows  that  existing  spatio-temporal  indices  cannot  handle  application  categories 
that  have  the  following  characteristics:  valid-time  database,  dynamic  data  acquisition,  fully 
evolving,  retrieval  of  both  historical  information  and  trajectories  ( combined  queries),  spatio- 
temporal  discrimination  and  a  minimum  overlap  or  linear  ordering  insertion/split  strategy. 

1  It  is  noted  that  3  versions  of  the  3D-Tree  are  presented  in  [VAZ96]. 
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Table  2.  Evaluation  of  Existing  Spatio-Temporal  Structures 


Requirement 

Structure  \ 

Valid-Time 

Fully 

Evolving 

Data 

Dynamic 

Data 

Acquisition 

Combined 

Queries 

Spatio-Temporal 

Discrimination 

Minimum 

Overlap 

or  Linear 

Ordering 

Strategy 

RT-tree 

No 

Yes 

No 

No 

No 

Yes 

3D  R-tree 

(version  1) 

Yes 

No 

No 

Yes 

Yes 

Yes 

3D  R-tree 

(version  2) 

Yes 

Yes 

No 

No 

Yes 

Yes 

3D  R-tree 

(version  3) 

Yes 

Yes  j 

No 

No 

Yes 

Yes 

2+3R-tree 

Yes 

Yes 

Yes 

No 

Yes 

Yes 

HR-tree 

No 

Yes 

No 

No 

Yes 

No 

MV3R-tree 

No 

Yes 

No 

No 

Yes 

Yes 

STR-tree 

No 

Yes 

No 

Yes 

Yes 

No 

TB-tree 

No 

Yes 

No 

No 

No 

No 

It  is  emphasized  that  many  application  categories  of  critical  importance  have  the  above 
requirements,  and  thus  cannot  be  handled  by  existing  spatio-temporal  structures. 

3.  The  2-3  Trajectory  R-tree 

This  paragraph  proposes  a  spatio-temporal  access  method,  the  2-3  Trajectory  R-tree  (2- 
3TR-tree),  which  meets  the  requirements  defined  above.  The  2-3TR-tree  is  based  on  the  2+3R- 
tree  and  the  TB-tree.  Like  the  2+3  R-tree,  the  2-3TR-tree  consists  of  two  R-tree  indices.  Any 
derivative  of  the  R-tree  can  be  used.  More  particularly,  two  R*-trees  or  two  Hilbert  R-trees 
[KAFA94]  can  be  used, 


7 


The  2-3TR-tree,  A  trajectory-Oriented  Index  Structure  for  Fully  Evolving  Valid-Time  Spatio-Temporal 
Datasets 

which  allows  applying  a  minimum  overlap  or  linear  ordering  insertion/split  strategy.  In  the 
following  we  refer  to  these  indices  as  R-trees. 

In  the  2-3TR-tree,  one  R-tree  indexes  two-dimensional  points,  and  the  other  one  indexes  three- 
dimensional  points  or  lines.  The  two-dimensional  points  represent  the  current  spatial  information 
about  the  data,  i.e.  the  current  objects’  positions.  The  current  positions  are  kept  along  with  their 
start  time.  The  three-dimensional  points  or  lines  represent  the  historical  information  about  the 
data.  A  three-dimensional  point  corresponds  to  a  position  that  was  held  by  an  object  during  only 
one  timestamp.  A  three-dimensional  line  represents  a  position  that  was  held  during  a  time 
interval.  Thus,  the  indexed  data  are  not  trajectory  segments  but  the  objects’  positions.  As  a  result, 
there  is  no  dead  space  involved  by  moving  objects  like  in  the  3D  R-tree  versions  1  and  2. 

Therefore,  the  2-3TR-tree  handles  fully  evolving  data. 

While  the  end  time  of  an  object’s  position  is  unknown,  it  is  indexed  under  the  two- 
dimensional  R-tree.  Once  the  end  time  of  an  object's  current  position  is  known,  the  position  is 
inserted  in  the  three-dimensional  R-tree  along  with  its  start  and  end  times,  and  removed  from  the 
two-dimensional  R-tree.  Therefore,  no  open  cube  is  created  when  an  object  whose  movement  is 
unknown  is  inserted.  Moreover,  data  can  be  inserted  or  updated  in  past  states.  As  a  consequence, 
the  2-3TR-tree  allows  dynamic  data  acquisition  and  handles  valid-time  databases.  Furthermore, 
the  three-dimensional  R-tree  offers  spatio-temporal  discrimination. 

Finally,  an  additional  difference  with  the  2+3R-tree  is  that  when  a  point  (x0,  y0)  indexed 
under  the  two  dimensional  R-tree  changes  position  at  tj;  the  segment  [(xo,  yo,  k)  (xo,  yo,  tj-i)]  is 
inserted  in  the  three-dimensional  R-tree,  instead  of  the  segment  [(xo,  yo,  k)  (x0,  yo,  tj)].  This 
slightly  reduces  overlap  (at  tj  the  object  is  indexed  only  once)  and  does  not  involve  any  loss  of 

information  since  we  consider  discretely  moving  points. 

The  data  structure  of  the  2-3TR-tree  is  based  on  that  of  the  TB-tree.  Each  position  held 
by  a  data  object  is  represented  by  an  object,  FeatureSnapshot,  which  contains  the  position  s 
spatial  coordinates  as  well  as  its  time  information.  The  time  information  can  be  a  single 
timestamp  or  the  start  and  end  times  of  a  time  interval.  A  FeartureSnapshot  object  also  contains 
an  array  associating  certain  timestamps  to  corresponding  sets  of  attributes.  If  a  data  object 
remains  at  the  same  location  during  a  time  interval,  different  sets  of  data  information  can  then  be 
recorded  in  the  array.  If  a  data  object  holds  a  position  during  only  one  timestamp,  the  array  has 
only  one  entry. 

Each  FeatureSnapshot  has  a  pointer  to  a  FeatureTrajectory  object.  A  FeatureTrajectory 
object  is  a  superstructure  linking  all  the  locations  where  a  specific  object  has  been.  It  contains  an 
array  that  is  sorted  with  respect  to  time  and  associates  timestamps  to  pomters  to  the 
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corresponding  FeatureSnapshot  objects.  If  a  FeatureSnapshot  object  has  a  duration  (i.e.  the 
position  it  represents  is  held  during  a  time  interval),  the  associated  timestamp  m  its 
FeatureTrajectory  is  its  start  time. 

In  the  2-3TR-tree,  range  queries  are  performed  as  dictated  by  the  R-tree  structure  chosen 
for  the  two  indices.  The  FeatureTrajectory  object  permits  the  efficient  retrieval  of  a  data  object’s 
trajectory  from  one  of  its  positions.  Indeed,  once  a  specific  FeatureSnapshot  F  is  obtained,  the 
array  in  the  FeatureTrajectory  pointed  by  F  is  used  to  get  the  successive  positions  of  the  data 
object.  Therefore,  the  2-3TR-tree  handles  combined  queries.  In  addition,  the  FeatureTrajectory  s 
array  allows  retrieving  partial  trajectories,  i.e.  a  data  object’s  positions  during  a  specific  time 
interval. 

The  advantages  of  the  2-3TR-tree  are  summarized  below: 

•  The  2-3TR-tree  handles  valid-time  md  fully  evolving  datasets. 

•  It  supports  dynamic  data  acquisition. 

•  It  uses  a  minimum  overlap  or  a  linear  ordering  insertion/split  strategy  and  offers 
spatio-temporal  discrimination:  therefore,  it  efficiently  retrieves  historical  information. 

•  It  does  not  involve  version  duplication. 

4.  Performance  Evaluation 

This  section  strives  to  give  a  performance  evaluation  for  the  2-3TR-tree.  We  are  interested 
in  evaluating  the  cost  of  historical  information  retrievals  (spatio-temporal  range  queries),  smce 
trajectories  will  be  straightforwardly  retrieved  from  historical  information  retrievals  by  using  the 
modified  data  structure  of  the  2-3TR-tree. 


The  cost  model  for  the  2-3TR-tree  will  be  derived  using  the  following  notations: 


Symbol 

Definition 

R 3d 

Three-dimensional  R-tree 

R-2d 

Two-dimensional  R-tree 

f3d 

Average  three-dimensional  R-tree  node  fanout 

fid 

Average  two-dimensional  R-tree  node  fanout 

hu 

Height  of  the  three-dimensional  R-tree 

h2d 

Height  of  the  two-dimensional  R-tree 

N 

Number  of  data  indexed  (N  =  N3d  +  N2d) 

N3d 

Number  of  data  indexed  in  the  three-dimensional  R-tree 

N2d 

Number  of  data  indexed  in  the  two-dimensional  R-tree 

Dm 

Density  of  data  indexed  in  the  three-dimensional  R-tree 

Did 

Density  of  data  indexed  in  the  two-dimensional  R-tree 
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D 3d,l 

Density  of  node  MBBs  at  level  1  in  the  three-dimensional  R-tree 

D2d,i 

Density  of  node  MBRs  at  level  1  in  the  two-dimensional  R-tree 

qk 

Average  extent  of  a  query  rectangle  q  on  dimension  k 

S,k 

Average  extent  of  data  MBBs  on  dimension  k 

tstart 

represents  the  minimum  of  the  start  times  of  the  data  indexed  in  R3d 

tmax,3d 

represents  the  maximum  of  the  end  times  of  the  data  indexed  in  R3d 

tmin,2d 

represents  the  minimum  of  the  start  times  of  the  data  indexed  in  R2d:  it  is  the  start  time  of 
the  position  that  has  been  held  the  longest 

Xq, 

X*  =1  if  qt  n  (W  tfnax,3d]  ^  0  and  0  otherwise 

q 

Query  window  q  =  (qx  ,qy,  qt) 

Yqt 

Yqt  =  1  if  qt  0  [tmin>2d,  CurrentTime]  f  0 

NA_total(R,q) 

Number  of  node  accesses  for  a  selection  query  between  the  tree  R  and  the  query  wmdow  q 

An  extension  of  the  cost  model  outlined  in  [TS96]  for  multi-dimensional  range  queries  has  been 
used  to  derive  an  analytical  cost  formula  NAjotal(2-3TR-tree„q, )’’  for  our  newly  proposed 
2+3TR-tree.  This  cost  formula  is  shown  below.  For  additional  information  the  interested  reader  is 


With 


Nu  *= 1 

( 


*=1 


D 


3d,M 


1  + 


V 
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U2d,l  1 
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( 
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f: 


1/2 
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(2) 

(3) 


These  expressions  specify  the  query  cost  for  uniform  distributions.  For  non-uniform  data 

distributions,  the  following  modifications  have  to  be  performed. 

.  The  average  densities  DW)  and  D2d.o  are  replaced  by  the  actual  densities  D’3d,o  and 
D  ’2d,o of  the  data  set  within  the  area  of  the  specified  query  window  q. 
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•  The  amount  N3d  and  N2d  are  replaced  by  a  transformation  N'3d  and  N’2d  computed  as 
follows: 


D  ' 

~  ~  ~/V3 d 


D , 


AT  '= 

ly2d 


D 


2d,0 


D , 


■JV. 


2d 


(4) 


'3rf,0  2d>° 

is  the  density  in  the  two-dimensional  space  of  the  points  indexed  under  the  two- 
dimensional  R-tree.  D3d0  is  the  density  in  the  three-dimensional  space  of  the  points  and  segments 
indexed  under  the  three-dimensional  R-tree.  The  segments  correspond  to  the  positions  that  are 
hed  during  several  timestamps,  so  they  are  all  oriented  along  the  time  axis. 


5.  Benchmarking 

This  section  presents  the  different  tests  performed  to  benchmark  the  2-3TR-tree.  First,  we 
examine  its  behavior  for  uniformly  distributed  datasets  and  then,  for  non-uniform  datasets. 
Finally,  we  compare  the  2-3TR-tree  to  other  spatio-temporal  indices  that  have  been  implemented 

in  [NST99]. 

We  first  focus  on  the  parameters  in  the  cost  model  that  are  related  to  the  dataset  s 
characteristics,  namely  Nsd  and  N2d- 


rambof  of  data  points 


Figure  1:  Cost  evaluation  for  N2  =  10%  N 

In  the  first  series  (Figure  1),  we  also  make  the  fanouts  of  the  two-dimensional  and  the 
three-dimensional  R-trees  vary.  Indeed,  the  fanouts  depend  on  the  pages  size  and  the  size  of  the 
data  to  be  indexed.  For  practical  reasons,  we  assume  that  the  fanouts  of  the  two-dimensional  and 
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the  three-dimensional  R-trees  are  the  same:  f2d  =  f3d-  In  this  series,  the  fanouts  2  vary  between  3 
and  69.  The  total  number  of  indexed  positions,  (N  =  N2d  +  N3d),  ranges  from  1000  to  100,000,  N2d 
=  10%  N.  The  query  window  size  is  set  to  5%  of  each  of  the  axis,  and  its  time  interval  overlaps 
both  R-trees.  We  make  the  assumption  D2(i,o  -  D3d,o  =  0.  This  corresponds  to  densities  computed 
in  the  context  of  grids  with  infinitely  small  cells  (high  definition)  and  involves  a  lower  cost  than 
if  we  assigned  strictly  positive  densities. 


Figure  2:  Cost  evaluation  for  N2  varying 

The  cost  drastically  increases  when  the  fanout  decreases  and  the  number  of  indexed 

increases:  this  is  because  the  height  of  both  R-trees  increases. 

To  obtain  more  legible  and  realistic  results,  f2d  and  f3d  are  each  assigned  a  different  value 
(Figure  2).  We  consider  pages  of  1024  bytes.  Objects  characteristics  in  the  three-dimensional  R- 
tree  are  a  set  of  values  (id,  x,  y,  ti,  tj),  and  objects  characteristics  in  the  two-dimensional  R-tree 
are  a  set  of  values  (id,  x,  y,  ti).  So,  if  4  bytes  per  number  are  allocated,  for  the  three-dimensional 
R-tree  M  =  1024  bytes/(4  bytes  *  5)  =  50,  and  for  the  two-dimensional  R-tree  M  =  1024  bytes/(4 
bytes  *  4)  =  64.  So  f2d  =  42  and  f3d  =  33.  The  other  parameters  are  set  as  in  the  previous 

experiment. 

A  larger  proportion  of  points  indexed  under  the  two-dimensional  R-tree  involves  more 
node  accesses.  This  is  due  to  the  fact  that  in  the  two-dimensional  R-tree  there  is  no  temporal 
discrimination.  Thus,  a  range  query  performed  in  the  two-dimensional  R-tree  requires  the 
scanning  of  all  the  positions  that  fall  within  a  spatial  window  and  also  requires  the  checking  of 


2  67%  M  (maximum  node  capacity)  is  a  typical  fanout  value  for  R-trees  and  variants;  here  the  values  taken 
by  the  fanouts  correspond  to  M  varying  from  5  to  103. 
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whether  these  positions  also  fall  within  the  query  time  interval.  Therefore,  the  processing  cost 
increases. 

In  this  section,  we  are  interested  in  the  influence  of  the  query  window  size.  The  query 
window  size  is  given  relatively  to  the  workspace.  For  instance,  qx  =  0.4  means  that  qx  corresponds 
to  40%  of  the  x-axis.  We  make  the  assumptions  D2d,o  =  D3dfi  =  0,  N=1 00,000  positions  and  N2d  = 
10%  N,  and  f2d  =  42  and  f3d  =  33.  We  assume  that  the  query  timestamps  or  time  interval  overlaps 
both  R-trees.  This  corresponds  to  one  of  the  worst  cases  defined  in  1.3.  The  resulting  query 
performance  should  be  lower  than  if  we  assumed  that  the  query  time  interval  overlaps  only  one  of 
the  trees.  Moreover,  for  timestamp  queries,  qx  =  qy  =  query  window  size  and  qt  =  0;  for  time 
interval  queries,  q,=  qx  =  qy  =  query  window  size. 


Figure  3:  Cost  evaluation  for  q  varying  and  f2d  =  42  and  fM  -  33 

As  expected  (Figure  3),  the  larger  the  query  window,  (which  corresponds  to  a  cube  for 
time  interval  queries),  the  greater  number  of  node  accesses.  This  graph  shows  high  query  costs; 
this  is  because  large  query  windows  (up  to  50%  of  each  axis)  are  considered. 

Next,  we  generate  a  non-uniformly  distributed  dataset  of  points.  The  workspace  is  made 
up  of  100*100*100  cells;  this  defines  the  granularity.  1 1,000  positions  are  indexed:  10,000  under 
the  three-dimensional  R-tree  and  1,000  under  the  two-dimensional  R-tree.  Thus  for  a  uniform 
distribution,  the  densities  would  be  defined  as  follows: 

D2d,o  =1000/(100*100)  =  0.1 
Did0  =10000/(100*100*100)  =  0.01 


13 


The  2-3TR-tree,  A  trajectory-Oriented  Index  Structure  for  Fully  Evolving  Valid-Time  Spatio-Temporal 
Datasets 

Several  query  windows  of  the  same  size  are  computed.  For  each  one  we  count  the 
number  N’  of  the  non-uniformly  distributed  points  that  are  inside  the  window,  and  we  derive  the 
density  of  the  points  in  the  query  window  D^o-  Then,  we  evaluate  each  query  cost  using. 

N’3d  =  D’3d,o/  D3d,o*N3d  in  formula  (17).  Moreover,  we  assume  that  the  number  of  points  mdexed 
under  the  two-dimensional  R-tree  and  to  be  retrieved  by  the  query  corresponds  to  10%  N\  Finally 
the  costs  obtained  for  all  the  queries  of  the  same  size  are  averaged  and  compared  to  the  cost  in  a 
uniform  distribution. 

We  now  turn  our  attention  to  time  queries.  We  first  examine  point  queries,  i.e.  qt  -  0. 


Figure  4:  Cost  evaluation  for  timestamp  queries  in  uniform  and  non-uniform  distributions. 

As  expected  (Figure  4),  for  both  distributions  the  query  cost  increases  with  the  query 
window  size.  Moreover,  depending  on  the  densities  of  the  queried  regions,  the  query  cost  for  non- 
uniform  distributions  will  be  lower  or  higher  than  the  cost  for  uniform  distributions.  Indeed,  let  us 
consider  a  non-uniformly  distributed  dataset  with  N  positions  of  average  density  D.  According  to 
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Figure  5:  Cost  evaluation  for  time  interval  queries  in  uniform  and  non-uniform  distributions. 

the  model  developed  in  section  4,  performing  a  range  query  in  a  region  W  of  density  D’>D  is 
equivalent  to  applying  a  range  query  in  W  to  a  dataset  of  N’  uniformly  distributed  positions  of 
density  D\  where  N’=N*D’/D  and  thus  N’>N.  Therefore,  the  resulting  query  cost  is  higher  than 
the  cost  of  a  range  query  performed  in  W  and  applied  to  a  dataset  with  N  uniformly  distributed 
positions  of 

density  D.  The  difference  between  the  query  costs  for  uniform  and  non-uniform  distributions 
increases  with  the  difference  in  densities  within  the  non-uniformly  distributed  dataset. 

Figure  5  shows  the  cost  evaluation  for  time  interval  queries  in  uniform  and  non-uniform 
distributions.  The  query  cost  for  the  non-uniform  distribution  is  slightly  lower  than  that  of  the 
uniform  distribution.  It  may  be  due  to  the  fact  that  a  majority  of  the  query  window  samples  fell 
within  regions  of  lower  densities.  We  can  also  notice  that  the  larger  the  query  window,  the  less 
relative  difference  there  is  between  the  two  costs.  Moreover,  the  standard  deviation  of  the  non- 
uniform  distribution  query  cost  samples  tends  to  decrease  when  the  query  window  size  is 
increased.  This  can  be  explained  by  the  fact  that  the  larger  the  query  window  size,  the  closer  the 
average  density  in  this  window  is  to  the  average  density  of  the  dataset  (which  represents  the 
density  of  the  uniform  distribution). 

6.  Comparison  with  Other  Spatio-Temporal  Structures 

In  this  section,  the  2-3TR-tree  is  compared  to  the  3D  R-tree  (version3),  the  2+3R-tree  , 
and  the  HR-tree.  This  comparison  is  based  on  the  results  obtained  in  [NST99]  for  uniform 
distributions.  The  experiments  were  run  using  100,000  points  moving  over  100  timestamps.  f2d  - 
42  and  f3d  =  33.  In  the  cost  model  for  the  2-3TR-tree,  we  make  the  assumption  D2d,o  =  D3d.o  =  0. 
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Timestamp  queries 


Figure  6:  Cost  comparison  for  q  =  [0.04  0.04  0] 


As  expected,  for  all  the  structures  the  query  cost  increases  when  the  query  window  increases 


(Figures  6  &  7). 


Figure  7:  Cost  comparison  for  q  =  [0.1  0.1  0] 

For  timestamps  queries  (Figures  6  &  7),  the  query  costs  involved  by  a  structure  should  be 
constant  for  any  timestamp,  since  the  distribution  is  uniform.  This  is  true  for  all  the  structures, 
except  for  the  2+3R-tree  and  the  2-3TR-tree:  at  t  =  100,  the  query  cost  decreases;  this  is  due  to 
the  fact  that  the  three-dimensional  of  the  two  R-trees  is  not  traversed  because  there  is  no  object 
with  an  end  time  equal  to  100,  thus  all  the  answers  come  from  the  2D  R-tree  (where  there  is 

always  a  current  version  for  all  indexed  points). 

Even  if  we  consider  the  10%  error  rate  of  the  cost  model,  the  2-3TR-tree  appears  to  be 
better  than  the  two  other  valid-time  structures,  the  3D  R-tree  and  the  2+3R-tree.  This  good 
performance  of  the  2-3TR-tree  as  compared  to  the  2+3R-tree  can  be  explained  by  its  reduction  of 

overlapping. 
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Figure  8:  Cost  comparison  for  q  =  (0.04  0.04  length/100] 

However,  the  HR-tree  requires  substantially  smaller  query  processing  time  in  all  cases.  Indeed, 
when  the  temporal  part  of  the  query  is  a  point,  the  tree  that  corresponds  to  that  timestamp  is 
obtained  and  the  query  processing  is  exactly  the  same  one  of  a  containment  query  in  a  single 
standard  R-tree.  On  the  other  hand,  for  the  3D,  2+3  R-tree,  and  the  2-3TR-tree,  the  whole 
structure  is  involved  in  the  query  processing,  thus  increasing  significantly  the  query  processmg 

cost. 

Time  interval  queries 

Length  represents  the  length  of  the  time  interval  of  the  query  and  is  measured  in  number  of 
consecutive  timestamps. 


Figure  9:  Cost  comparison  for  q  =  (0.1  0.1  length/100] 

Once  again,  as  anticipated,  for  all  the  structures,  the  query  cost  increases  with  the  spatial 
range  and  the  time  interval  of  the  query  (Figures  8  &  9). 
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We  can  notice  that  the  HR-tree  loses  its  relative  advantage  relatively  fast  with  the 
increase  in  the  queried  interval  length.  Indeed,  the  HR-tree  is  well  suited  to  search  a  time  point, 
for  time  intervals  it  has  to  traverse  as  many  logical  R-trees  as  many  indexed  time  points  are 
covered  by  that  interval.  This  phenomenon  is  worsened  by  the  increase  in  the  query  window  area. 

Moreover,  the  2+3R-tree  loses  its  advantage  as  compared  to  the  3D  R-tree  because  its 
two-dimensional  R-tree  is  searched  and  this  latter  does  not  provide  temporal  discrimination. 

Query  costs  involved  by  the  2-3TR-tree  remain  lower  than  those  of  the  3D  R-tree.  This 
can  be  due  to  the  error  involved  by  the  cost  model  and  to  the  fact  that  the  2-3TR-tree  reduces 
overlapping  as  compared  to  the  2+3R-tree  and  the  3D  R-tree  version  3. 

From  these  tests,  the  HR-tree  has  better  timestamp  query  performance  than  the  2-3TR- 
tree;  however,  it  loses  its  advantage  when  the  query  time  interval  increases,  and  it  cannot  handle 
valid-time  databases.  It  also  appears  that  the  2-3TR-tree  performs  range  queries  at  least  as  well  as 
the  two  other  valid-time  index  structures,  namely  the  2+3R-tree  and  the  3D  R-tree  version  3. 
Moreover,  the  2-3TR-tree  supports  dynamic  data  acquisition,  unlike  the  3D  R-tree  and  the  HR- 
tree,  and  handles  trajectory  retrievals,  whereas  none  of  the  other  tested  structures  does. 

7.Conclusion 

This  paper  first  presented  existing  access  methods  for  spatial  data.  From  this  discussion,  we 
pointed  out  that  the  R-tree  and  its  derivatives  were  the  most  efficient  structures  for  indexing  non- 
uniformly  distributed  data.  That  is  why  we  then  focused  on  and  described  spatio-temporal  indices 
based  on  R-trees. 

A  new  taxonomy  that  classifies  spatio-temporal  access  methods  regarding  to  dataset 
characteristics  and  query  requirements  was  defined.  This  new  taxonomy  classifies  any  existing  R- 
tree  structure  as  a  point  in  a  8-dimensional  space.  According  to  this  taxonomy,  we  specified  the 
requirements  involved  by  a  type  of  applications  that  could  not  be  handled  by  any  existing  spatio- 
temporal  index  structure.  We  then  designed  a  spatio-temporal  access  method,  the  2-3TR-tree, 
based  on  the  2+3R-tree  and  the  TB-tree  that  meets  these  requirements.  More  particularly  the  2- 
3TR-tree  handles  valid-time  and  fully  evolving  datasets;  it  supports  dynamic  data  acquisition;  it 
uses  a  minimum  overlap  or  a  linear  ordering  insertion/split  strategy  and  offers  spatio-temporal 
discrimination:  therefore,  it  efficiently  retrieves  historical  information;  moreover,  it  does  not 
involve  version  duplication. 

Finally,  in  order  to  benchmark  this  new  structure,  an  analytical  cost  model  based  on 
an  existing  cost  model  for  R-trees  was  defined.  The  2-3TR-tree  was  compared  to  other 
spatio-temporal  indices  and  its  behavior  was  analyzed.  From  these  tests,  the  2-3TR-tree 
appears  to  perform  well  in  historical  information  retrievals  as  compared  to  the  HR-tree, 
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the  3D  R-tree  version  3,  and  the  2+3R-tree.  Moreover,  it  handles  trajectory  retrievals 
whereas  the  other  tested  structures  do  not. 
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USEFUL  HURRICANE  INFORMATION 
HURRICANE  ANNOUNCEMENTS: 


1.  When  authorized,  CNMOC  and  NASA,  will  issue  a  joint  announcement  instructing  employees  when  Stennis 
Space  Center  will  be  closed  or  reopened  for  normal  working  hours.  This  information  will  be  broadcast  on  most 
radio  and  television  stations  within  the  SSC  commuting  area.  Specific  information  on  the  status  of  NRL-SSC  will 
be  available  by  dialing  1-228-688-4010.  Information  is  also  available  by  dialing  into  the  NASA  Emergency 
Operations  Center,  1-228-688-3777. 

2.  Personnel  should  also  listen  to  the  following  TV  and  radio  stations  for  reporting  to  work  instructions: 


RADIO  I  AM  Frequencies! 
WVMI 570 -Biloxi 
WWL  870  -  New  Orleans 
WBSL  1 190  -  Bay  St.  Louis 
WRJW  1320  -  Picayune 
WSDL  1560  -  Slidell 


RADIO  (FM  Frequencies-) 
WQID  93.1  -  Biloxi 
WKKN  97.9  -  Biloxi 
WGCM  101.9  -  Gulfport 
WJOJ  106.3  -  Picayune 
WXHR  106.9  -  Slidell 


TELEVISION 
WWL  4  -  New  Orleans 
WLOX  13  -  Biloxi/Gulfport 
WXXV  25  -  Gulfport 
WDSU  6  -  New  Orleans 
WKRG  5  -  Mobile 


Recent  experience  indicates  that  WWL  870  AM  is  the  most  listened  to  radio  station  for  weather  information 
throughout  the  local  area.  However,  SSC  information  is  broadcast  more  frequently  by  the  local  community  stations. 


HURRICANE  PREPARATIONS: 

1 .  When  a  hurricane  threatens,  all  employees  should: 

•  Keep  abreast  of  official  weather  reports. 

•  Read  your  newspaper  and  listen  to  radio  and  television  for  official  announcements. 

•  Note  the  location  of  the  nearest  Emergency  Shelter. 

•  Pregnant  women,  the  ill  and  infirm  should  call  a  physician  for  advice. 

•  Be  prepared  to  turn  off  gas,  water  and  electricity  where  they  enter  your  home. 

•  Fill  tubs  and  containers  with  water,  typically  one  gallon  -  person  -  day. 

•  Make  sure  your  car  has  a  full  tank  of  gas. 

•  Secure  your  boat.  Use  long  lines  to  allow  for  rising  water. 

•  Secure  moveable  objects  on  your  property. 

•  Board  up  or  tape  windows  and  glassed  areas.  Remove  furniture,  and  other  lawn 
products,  in  the  vicinity  of  windows. 


HURRICANE  EVACUATION: 


•  Leave  as  soon  as  you  can,  follow  official  instructions  only. 

•  Disconnect  all  electrical  appliances  except  refrigerator  and  freezer.  Their  controls  should  be 
turned  to  the  coldest  setting  and  the  doors  kept  closed. 

•  Leave  food  and  water  for  pets.  Seeing  eye  dogs  are  the  only  animals  allowed  inside  the  shelters. 


STOCK  ADEQUATE  SUPPLIES  OF: 

Hand  Tools  (Hammers,  Pliers) 
Tape 


Batteries  (and  radios) 
Flashlights 


First-aid  kit 
Candles  and  matches 
Prescription  medications 
Disinfectant 
Insect  Repellent 
Plastic  drop  cloths 
Boards 
Rope  ** 

Ax  ** 

Containers  for  water 
Water  purification  tablets 


Can  opener  and  utensils 
Canned  meat 

Canned  food,  juices,  soft  drinks 
Gum,  candy 

Charcoal  bucket  and  charcoal 
Fire  extinguisher 
Buckets  of  sand 
Lifejackets 
Hard-top  head  gear 
Ready  cash 


**  Keep  ax  to  cut  an  emergency  escape  opening.  If  you  go  to  the  upper  floors  or  attic  of  your 
home,  take  rope  for  escape  to  lower  levels  when  water  subsides. 


USEFUL  WEB  SITES: 

Louisiana  State  Emergency  EVACUATION  ROUTE  MAPS: 

http-//www.dotd.state-1a.us/maps/state.pdf 

Mississippi  Emergency  Management  Agency  information  web  page: 

http://www.memaorg.com/hurricanepreparedness.htai 


Southern  Mississippi  county  EVACUATION  ROUTE  MAPS: 

http://www.memaorg.com/hancockcountvevac.ipg 
h  ttp-  //www.memaorg .  com/harri  soncountvevac  .jpg 
http://www.memaorg.com/iacksoncountvevac.ipg 


